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ABSTRACT 

Context. Inversion codes are numerical tools used for the inference of physical properties from the observations. Despite their success, 
the quality of current spectropolarimetric observations and those expected in the near future presents a challenge to current inversion 
codes. 

Aims. The pixel-by-pixel strategy of inverting spectropolarimetric data that we currently utilize needs to be surpassed and improved. 
The inverted physical parameters have to take into account the spatial correlation that is present in the data and that contains valuable 
physical information. 

Methods. We utilize the concept of sparsity or compressibility to develop an new generation of inversion codes for the Stokes 
parameters. The inversion code uses numerical optimization techniques based on the idea of proximal algorithms to impose sparsity. 
In so doing, we allow for the first time to exploit the presence of spatial correlation on the maps of physical parameters. Sparsity also 
regularizes the solution by reducing the number of unknowns. 

Results. We compare the results of the new inversion code with pixel-by-pixel inversions, demonstrating the increase in robustness 
of the solution. We also show how the method can easily compensate for the effect of the telescope point spread function, producing 
solutions with an enhanced contrast. 
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1. Introduction 


An inversion code is a computer program based on algorithms 
that allows the user to extract information about the parameters 
of a physical system from the interpretation of observable^ In 
solar physics, nonlinear inversion codes are routinely applied af¬ 
ter their development in the ’70s to get the thermodynamical and 
magnetic properties of different regions of the solar atmosphere 
from the interpretation of the Stokes parameters ( [Harvey et al.| 
1972[[Auer et al.|1977t|Skumanich & Lites|1987j ). 

One of the fundamental problems in general, and specifically 
for spectropolarimetric inversions, is that the relation between 
the observables and the physical parameters of interest is very 
convoluted. Despite the strong nonlinearity and nonlocality in¬ 
herent to the radiative transfer, it has been possible to develop 
some simple diagnostics during the past years: the line ratio tech- 
nique (|^nflo||1973||2010||2011|), th e center-of-gravity method 
( |Semel||1970| (Rees & Semel||1979| ) and calibration curves be¬ 
tween spectral line summaries and certain magnetic field proper¬ 
ties (e.g., |Lites et al.|20()^ (Martinez Fillet et al.|2011| for recent 
applications). 


^ The term inversion comes from the fact that, given a forward model 
defined by an operator O, one needs to apply the (nonlinear) inverse 
of the operator, to the observations to recover the model param¬ 
eters. From a strict point of view, it would be better to consider these 
algorithms as inference codes, although we stick with the classical term 
inversion to avoid confusion. 


The appearance of the first powerful computers allowed re¬ 
searchers to apply optimization techniques for nonlinear func¬ 
tions and use more elaborate models. A merit functioij^ is 
minimized with respect to the physical parameters defining the 
specific model. The first efforts (with the application of the com¬ 
puters available at that time) made use of the Milne-Eddington 
(ME) approximati on to analytically solve the radiative trans- 
fer equation (e.g., Harvey et al.[ 1972[ [Auer et aL][1977[ [Landi[ 


[Degl’Innocenti & Landolfi|2QQ4 ). Although the simplifying as 

sumptions that one needs to use when using the ME approx¬ 
imation may not be fully fulfilled in real solar plasmas, it is 
still one of the most widely used models, in part because of 
its simplicity. This simplicity leads to very fast inversion codes 
that can be applied to the large number of observations that we 
currently obtain. State-of-the-art inversion codes such as VEISV 
( [Borrero et al.[2007|[2010[ ), used for inferring magnetic field vec¬ 
tors from the Helioseismic and Magnetic Imager (HMI; onboard 


the Solar Dynamics Observatory) data, MILOS ( Orozco Suarez 
2007[ ) and MERLIN ( [Skumanich & Lites|1987|[Lites et al. 


et al. 


2007), currently applied to data from the Hinode spacecraft, or 


the codes based on look-up tables and the principal component 
analysis (PC A) decomposition ( [Rees et al.|2000[[Socas-Navarro 


[et al.pOOT) used for the inversion of THEMIS (Telescope heli- 
ographique pour T etude du magnetisme et des instabilites so- 
laires) data, are based on the ME approximation. 


^ That is a direct consequence of the assumption that the observations 
are corrupted with additive Gaussian noise. 


Article number, page 1 of 13 








































A&A proofs: manuscript no. waveletinversion 


The availability of more powerful computers in recent years 
allowed us to use more complex and more realistic models. One 
of the essential ingredients of this revolution was the applica- 
tion of the idea of response f unctions ( [Landi DeglTnnocenti| 


& Landi DeglTnnocenti 1977) to the inversion of Stokes pro¬ 


files with non-trivial depth stratifications of the physical quan¬ 
tities. The first representative of this family of codes was SIR 
(Stokes Inversion based on Response functions; |Ruiz Cobo &| 
del Toro Iniesta|199^ . Such evolution occured naturally at that 


time because observations were showing strong asymmetries 
in the Stokes profiles in magnetized regions. The explanation 
of such asymmetries requires the presence of grad ients along 
the line-of-sight (LOS) of the physical prop erties ([Solanki & 


Pahlke|[T9^ IGrossmann-Doerth et al.]|1988t ISolanki & Mon 


tavon|1993[|Sigwarth et al.|1999| Lopez Ariste 2002} Khomenko 


et al.|2003[|Martmez Gonzalez et al.|2008[|Viticchid & Sanchez 


Almeida|2011|). Another repr esentative of these inversions code 

is S PINOR ([Frutiger et al.||2000| ). Based on the same strat- 
egy, |Socas-Navarro et al.| p000|) developed the NICOLE code 


( Socas-Navarro et al. |2Qi4| ), capable of dealing with lines in 


NLTE (non-local thermodynamical equilibrium). This model has 
been mainly applied for the inversion of Ca ii infrared triplet 
lines, which are formed under strong NLTE conditions (e.g., 


Socas-Navarro et al.|2000t[de la Cruz Rodriguez et al.|2012| ). 

Another step forward in the field of inversion codes was car¬ 
ried out by [Asensio Ramos et al.| ( |2007| ) and [Asensio Ramos | 
( [2009| ), who introduced Bayesian inference for spectropolari- 
metric observations. This allows the user to obtain posterior 
probability distributions for any model parameter and their cor¬ 
relation with the remaining ones. This probabilistic inference is 
extremely powerful but requires a huge effort in terms of com¬ 
putational power. 

Arguably, the latest step in the field of inversion codes has 
been to compensate the effect of the telescope point spread func- 
tion (PSEj^that is present in the observed data. 


van Noort 


( 2012 ) 


implemented a forward approach, where synthetic spectra from 
the inversion are convolved spatially with the telescope PSE be¬ 
fore comparison with the observations. |Ruiz Cobo & Asensio] 


Ramos (2013) used a regularized deconvolution to compensate 


the observations before the inversions were performed. 

The former solves the full problem in which the spatial con¬ 
volution and the inversion are coupled in the same operator O, 
resulting in a very large-scale classical i nversion code. The use 
of a Levenberg-Marquardt (LM) scheme ( [Levenberg| 1 944} |Mar- 


|quardt|1963] ) with a compact PSE leads to a sparse Hessian ma¬ 
trix that largely facilitates the inversion by spatially coupling the 
convergence of adjacent pixels. On the contrary, the latter decou¬ 
ples the inversion. Eirst the observations are deconvolved by ap¬ 
plying a PC A regularization in the spectral domain and the appli¬ 
cation of a standard Richardson-Lucy ( |Richardson|| 1 972} |Lucy| 
|1974| ) maximum likelihood deconvolution for each eigenimage. 
Then, the resulting denoised and deconvolved data are inverted 
using any classical inversion code. Both approaches introduce 
different regularizations in the inversion problem and can lead to 
potentially different results. 

Also, in the former approach there is no need to consider a 
low-pass filtering or a regularization of the deconvolution, but 
the author reports cases of oscillatory patterns in the inferred 
model that need to be damped during the inversion. In prac¬ 
tice, the implementation of the second approach is easier, cleaner 
and less prone to the aforementioned oscillatory artifacts, but 


^ The telescope PSF smears the intensity of point sources spatially, 
mixing information from neighbouring features. 
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the PCA regularization can filter out features with low statistical 
weight in the dataset. These methods rely upon accurate knowl¬ 
edge of the PSP used in the inversion, which may not be always 
available or appli cable. _ 

We note that [Scharmer et al.| ( |2Q13| ) also combined data 
inversions with a regularized deconvolution to compensate 
ground-based observations f or straylight, believed to or iginate 
from small-scale aberrations ( [Lofdahl & Scharmer|2012| ). 

To take a new step forward in the field of inversion codes, 
in this paper we propose a new technique to spatially couple the 
parameters of the model, inspired on the idea of sparsity or com- 
pressib ility of data. This method allows to use a PSP in the same 
way as |van No'ort ( 2012| ) but it is not strictly needed for the spa¬ 
tial coupling. 


2. Sparsity regularization 

With the use of fast slit spectropolarimeters and also two- 
dimensional filterpolarimeters, we routinely have 2D maps of 
regions in the solar atmosphere with the four Stokes parameters 
observed at several points along one or several spectral lines. 
This rate of new high-quality 2D observations will increase in 
the future with the advent of bi-dimensional spectropolarimeters 
based on image sheers or optical fibers. The interpretation of 
these observations have been done in the past by assuming that 
all pixels are completely unrelated and applying the inversion 
codes in a pixel-by-pixel basis. After this pixel-by-pixel inver¬ 
sion, the spatial smoothness of the derived quantities is taken as 
an indication of the success of the inversions. Salt-and-pepper 
noise present in the inverted maps of physical parameters is an 
indication of problems: either the required information cannot 
be extracted from the Stokes profiles, or the inversions failed to 
converge to a good solution. 

It has become more and more evident that two-dimensional 
observations and the ensuing inversions are needed to fully un¬ 
derstand the physical processes in the solar atmosphere. This is 
the case even when one assumes local thermodynamical equi¬ 
librium (LTE), which relates radiation to the local properties of 
the plasma. The first repr esentatives of such ap proach are the 
already described codes of [van Noort| ( |2012] ) and [Ruiz Cobo &| 
[Asensio Ramos| ( [TQ13| ), where the PSE of the telescope couples 
the observed Stokes profiles of nearby pixels. 


2.1. The generalidea 

Here we develop the general idea of regularized inversion codes, 
with the regularization being based on the idea of sparsity. Spar¬ 
sity or compressibility idealizes the concept that the data can be 
projected to a parameter space where a reduced set of variables 
can be used to fully describe that dataset. The motivation for 
this regularization resides on a very simple observation. When 
one saves a continuum image as a raw file (for instance, a stan¬ 
dard 512x512 pixel image), the size of the file is roughly 1 MB 
(using 4 bytes per pixel). The same image compressed using a 
lossless file format reduces the size by a factor 3-4, while a lossy 
format can go further and increase the compression ratio to a 
factor ~ 10. The compression is possible, fundamentally, due to 
the existence of spatial correlation (properties like smoothness, 
structures in the image, ...) on the image. If appropriately ex¬ 
ploited, it is possible to predict the value of a certain pixel from 
the values of other pixels thus making it unnecessary to store the 
value of all pixels. 

Let us assume that we have observed the Stokes parameters 
on a 2D grid of A/pix = x Ny pixels for a set of wave- 
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length points that sample one or several spectral lines. We pro¬ 
pose a model atmosphere for every pixel to explain the obser¬ 
vations, each model atmosphere being defined with a set of A^par 
parameters, so that each pixel k is described with the vector of 
parameters = (pu, p 2 k, • • •, We can build the vector 

of parameters p that encode all the model parameters for all the 
pixels in the field-of-view, such that p = (pi,P2, • • It 

is possible to reorder this vector so that it is built by stacking 
the value of each physical parameter for all the pixels, so that 
P = {Pi.P 2 , • • • PA^par)’ where p/ encodes the value of a certain 
parameter (i.e., magnetic field inclination, Doppler velocity, etc.) 
as a 2D map. Note that the vectors p and p are equivalent except 
for the ordering. The aim of any inversion code is, then, to infer 
the full vector p from the observed Stokes profiles. As explained 
in the introduction, classical inversion codes deal with this prob¬ 
lem by inverting the observed Stokes profiles pixel by pixel. 

In our approach, instead of working directly on the real space 
of parameters, we use a linear transformation so that the i-th pa¬ 
rameter is given by: 

q; = W[p,]. (1) 

In the previous equation, W is a linear operator that transforms 
from the real space to the transformed space. Among many po¬ 
tential transformations, the most interesting in our context are 
the Fourier, wavelet or discrete cosine transforms. It is clear that 
by working on the transformed space and using an inversion 
code to infer the transformed parameters we have gained noth¬ 
ing. However, here we can impose a fundamental key ingredi- 
ent: the assumption that the transformed image is spars^in the 
transformed domain. In other words, if the appropriate transfor¬ 
mation W is used, many elements of the transformed image (that 
we term modes in the following for simplicity) are zero or very 
close to zero in absolute value. Sparsity, that is usually fulfilled 
in nature, has been shown to be an extremely powerful assump¬ 
tion. Techniques like compressed sensing ( [Candes et al.||2006 i 
Donoho|2006|) or exact m atrix completion from partial measure- 
ments ( [Candes & Recht||2009| ) strongly rely on the sparsity as¬ 
sumption. 


2.2. Testing for sparsity in physical parameters 


For the inversion that we describe in the following to work prop¬ 
erly, it is crucial to test for the sparsity of the maps of model 
parameters. To this end, we do ME pixel-by-pixel inversions of 
two maps observed with the Solar Optical Telescope/Spectro- 
Polarimeter SOT/SP ( |Lites et aT]|2001| ) aboard Hinode ( |Kosugi| 


et al. 20071. We choose to only invert the Fe i line at 6302.5 


A. The first map is a quiet Sun map extracted from the obser¬ 
vations analyzed by |Lites et al. ( |2008| ), which were obtained at 
disk center on 2007 March 10. The second map is a sunspot of 
the active region NOAA 10953, that was mapped at an average 
heliocentric angle of ^ = 12.8° on 2007 April 30. This inver¬ 
sion is carried out with a classical inversion code using the LM 
algorithm and no regularization whatsoever, but the inversion is 
repeated 10 times for each pixel using different initial values of 
the model parameters (which are randomized). The ME model 
depends on a set of assumptions: i) the atmosphere is plane- 
parallel, semi-infinite, and in local thermodynamic equilibrium, 
ii) the line opacity, Doppler width, and all line properties are con¬ 
stant with depth on the atmosphere, iii) the magnetic field vec- 


^ It is more precise to say that the physical parameters are compress¬ 
ible in the transformed domain, which means that the majority of coef¬ 
ficients in the transformed space are small. 


tor is constant with depth in the atmosphere, and iv) the Planck 
function is a linear function of the continuum optical depth 
so that Bt = Bo + B^Tc, where dr^ = -Kcds, where Kc is the 
continuum opacity. The ME model parameters of our code are: 
Doppler width of the line in wavelength units (AT), macroscopic 
bulk velocity (Vniac). the slope Bi and intercept of the source 
function, the ratio between the line and continuum absorption 
coefficients (p), the line damping parameter (a) and the mag¬ 
netic field vector parameterized by its modulus, inclination and 
azimuth with respect to a given reference direction (B, 6b and 
05 , respectively). 

Figures and display the effect of compressing some pa¬ 
rameters in the quiet Sun and the sunspot maps. Both maps con¬ 
tain 256x256 pixels and the selected parameters are the mag¬ 
netic field strength, Doppler velocity and width of the line. Each 
panel shows the reconstruction of the original image (the right¬ 
most image in each panel) when the image is transformed using 
a linear transform, thresholded leaving only a certain percentage 
of the largest coefficients and transformed back. The percentage 
of non-zero coefficients is shown as a label in each column. In 
this example, we use the Daubec hies -8 (db 8 ) orthogonal wavelet 
( Jensen & la Cour-Harbo|[ 200 T] ) as a sparsity enhancing trans¬ 
form, which is appropriate for smooth maps, although the results 
are similar if other transforms are used. The figure demonstrates 
that one gets a very well reproduction of the original maps when 
the number of non-zero coefficients is slightly larger than 15%. 
This compression can be obtained because the linear transfor¬ 
mations capture part of the spatial correlation in the image. As a 
consequence, the value of ~85% of the pixels can be predicted 
from the value of just 15% of the image plus the presence of 
spatial correlation. To help in this comparison. Fig. displays 
the histogram of the difference between the original magnetic 
field strength, Doppler velocity and width of the line and the 
compressed one for different levels of compression. In this case, 
apart from the Daubechies -8 orthogonal wavelet transform, we 
also introduce the Haar (dbl) orthonormal wavelet (equivalen t 
to the Daubechies-1 wavelet; [Jensen & la Cour-Harbo|[2QQT] ), 
that is a discontinuous wavelet appropriate for representing effi¬ 
ciently maps with spatial discontinuities and the discrete cosine 
transform (DCT), appropriate for periodic signals. At the light 
of these results, keeping 10-15% of the coefficients seems to re¬ 
produce very well the original maps of parameters. When the 
fraction of remaining coefficients is made too small, more arti¬ 
facts appear and the difference between the original maps and 
the compressed ones are too large. 

One of the consequences of keeping only a certain percent¬ 
age of the coefficients in the transformed domain is that we loose 
some information (lossy compression). It is important that the 
user of the inversion code described in this paper selects appro¬ 
priate threshold limits so that no important information is lost 
in the final maps. Although we recognize the presence of these 
additional parameters to tweak, standard inversion codes are not 
easy to use and are also full of parameters to tweak. At the end, 
all these parameters can be tuned during an initial trial-and-error 
phase and fixed afterwards. 


2.3. Advantages and disadvantages 

The inversion code that we describe in the following (together 
with the numerical techniques needed to enforce sparsity) infers 
the physical parameters in the transformed space. At the end, the 
physical parameter of interest is obtained by just a final applica¬ 
tion of the inverse transform. 
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Fig. 1. Test for the compressibility of the parameters inferred in a quiet Sun map. Each column indicates the number of non-zero coefficients of 
the original map that are retained. 


Working on the transformed domain and applying a sparsity 
regularization leads to the following obvious advantages: 


- Imposing that the solution to the inversion has to be sparse 
introduces a strong regularization because the number of un¬ 
knowns is heavily reduced. From the potential N^^rNxNy to 
only sNpsxNxNy, with s ^ 1. We have found that typical 
values of the sparsity 5* will be around 20-30%. Given that 
a large fraction of the coefficients can be set to zero with¬ 
out degrading the solution, the number of free variables that 
we are currently using in pixel-by-pixel inversions is highly 
overestimated. An overly large number of degrees of free- 


nonphysical fluctuations of the physical 

- The inverse linear transformation gives the value of a physi¬ 
cal parameters at a specific observed pixel as a linear combi¬ 
nation of all modes in the map. This global character induces 
changes in all the pixels of the original space simultaneously 
when perturbing the value of a single mode. Reversing the 
argument, the observed Stokes profiles of a single pixel pro¬ 
vide a little amount of information to all modes simultane¬ 
ously. This large redundancy is a very interesting advantage 
of our approach and introduces a strong regularization of the 
solution. This global character also introduces strong regu- 

^ We note that the response functions must be computed for every pa¬ 
rameter, despite the assumption of sparsity, to assess which coefficients 
are negligible. 


dom can produce 
parameter^ 
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Fig. 2. Same as Fig. [^but for the sunspot data. We note that these images correspond to the solution of the pixel-by-pixel inversion and contain 
some artifacts. 


larization due to the presence of spatial correlation in the ob¬ 
servables and the physical parameters. 

The main problem of the method that we present here is that 
a suitable linear transform has to be chosen a-priori. The wavelet 
transform is sufficiently versatile so that it can highly compress 
the 2D maps of many of the physical parameters needed for the 
synthesis of spectral lines. For this reason, our suggestion is to 
use the same transform W for all the parameters. However, the 
method explained in the following allows the user to choose a 
different linear operator W for every parameter p/. This way, it 
is possible to design a set of linear transforms that leads to a very 
sparse representation of all the needed physical parameters. This 
additional step that we need to give as compared with standard 
pixel-by-pixel inversions is the one that introduces the spatial 
regularization, though. 


2.4. Merit function 

All standard inversion codes work by optimizing a merit func¬ 
tion that measures the ^ 2 -norm of the residuals with respect to the 
vector p of physical parameters. If the observed Stokes parame¬ 
ters are corrupted with Gaussian noise with diagonal covariance 
matrix, the merit function for the whole map is: 


Tp = 


1 






( 2 ) 


k=l i=l j=l 


ijk 


where S(Ty, p^) = (/(Ty, p^), Q{Aj, p^), U{Aj, p^), V{Aj, p^)) 

refers to the synthetic Stokes vector at wavelength position j 
and position k. Likewise, 0{Aj, k) is the observed Stokes vec¬ 
tor at this very same wavelength and for pixel k. The symbol 
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Fig. 3. Histograms of the difference between the original parameters and the compressed ones at different levels (indicated in each column). The 
upper panels show the results for the quiet Sun map, while the lower panels refer to the sunspot data. Each histogram refers to a different linear 
transformation. 


(Tijk stands for the standard deviation of the noise at wavelength position /, for the Stokes parameter j and pixel k, while w/ is the 
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weight associated to each Stokes parameter (that we assume, for 
simplicity, the same for all wavelengths and pixels). This weight 
is introduced for technical reasons to help improve the conver¬ 
gence during the optimizatioij^ Classically, it is customary to 
solve the problem 

argmin;^-^, (3) 

P 

by direct application of the Levenberg-Marquardt algorithm, 
which is specially suited to the optimization of such ^ 2 -norms. 
This involves an optimization for all parameters of all pixels si¬ 
multaneously. Note that one can indistinguishably use p or p 
because the optimization has to be done for all parameters of all 
pixels. 

In a complete parallel formulation, one can work on the 
transformed domain and write the merit function as: 


Na 


xl = 








PIX k=l i=l j=l 


ijk 


(4) 


The previous expression takes into account that the synthetic 
Stokes profiles at pixel k depend now on the full vector q through 
the linear transformation. Therefore, the inversion problem is 
solved by computing 

argmin^q. (5) 

q 

The solutions to Eqs. ^ and ^ are strictly equivalent, with the 
difference that the former is solved in the physical parameters, 
while the latter in the transformed domain. However, at the light 
of the results of Figs. m and[^ Eq. ^ allows us to impose the 
sparsity constrain, so that the regularized solution is obtained by 
computing: 

argmin;k-q, subject to llqllo < i', (6) 

q 

where ||q||o is the ^Q-norm of q, equivalent to counting the num¬ 
ber of non-zero elements. The desired sparsity level is set by 
the upper limit s In other words, we minimize the 

merit function with respect to the modes q but using only a 
very small number of non-zero elements in q. Equivalently, the 
sparsity constrain can be incorporated in the function through a 
Lagrange multiplier, so that Eq. ([^ can be rewritten as: 

argmin^q+/l||q||o. (7) 

q 

with A a parameter that controls the strength of the constraint. 


Data: Stokes profiles and model atmosphere 
Result: ^o-r^gtilarized solution 

Initialization: = 1 and yi = qo, with qo a first estimation 

of the modes; 

while not converged do 

<\k = Hs [y^ - hVq^^Cy^)]; 

4+1 - -2-’ 

jk+i = q/: + (|;7) (q^ - q/:-i); 

if - qk-i) > 0 then 

I qo = yo = ^k, tk = l^ndk = 0; 

end 

end 

return q^^ 

Algorithm 1: FISTA algorithm with restarting strategy. 


where /(q) is a convex and differentiable function, while g(q) 
is another convex function, possible non-smooth. This is exactly 
the type of problem that we have to solve in Eq. (|7]), in our case 
with /(q) =xl and g(q) = d||q||o. 

One of the most successful algorithms developed recently 
belong to the class of methods termed proximal algorithms (e.g., 
|Parikh & Boyd|20f4| ). They can be viewed as the equivalent of 
the Newton method for non-smooth, constrained, and large-scale 
optimization problems. In this case, the solution to the general 
problem ([8]) is given by the following iteration (e.g.,|Starck et al.| 


q,+i = prox^ [q, - hVq/(q,)], 


(9) 


where one carries out a step along the gradient of /(q) and then 
applies the so-called proximity operator, defined in the follow¬ 
ing. Note that, since /(q) = x\ is differentiable, the term inside 
parenthesis is the standard gradient descent method that are typ¬ 
ically used in many inversion codes for the Stokes parameters. 

It is clear from Eq. ([^ that the key ingredient of these proxi¬ 
mal algorithms is the application of the proximity operator of t he 
regularization function g(x), defined as ( [Parikh & Boyd|2014) : 


prox^(x) = arg min 

V 


1 . 


-||v-x||2+Tg(v) 


( 10 ) 


In general, this proximal operator has to be computed numeri¬ 
cally, but exact solutions exist for many interesting regulariza¬ 
tion terms ( [Parikh & Boyd 2014| ). In fact, in our case that we use 
the ^o-norm as regularization, the proximity operator is simply 
given by: 


2.5. Proximal algorithms 

As a consequence of the recent huge increase on the interest 
of compressed sensing, matrix completion, big data and related 
techniques based on the idea of sparsity, several algorithms have 
been developed to optimize the addition of convex functions in¬ 
cluding non-convex constraints (^o or norms, for instance). 
The objective is to solve the following problem: 

argmin/(q) -r g(q), (8) 

q 

^ Despite its technical interpretation, it can also be interpreted as a 
modification of the noise variance associated to each Stokes parameter. 
It is customary to use the same noise variance for all the Stokes param¬ 
eters and then modify them using iv/. 


prox^/x) =//s(x), (11) 

where Hs(x) is the nonlinear hard thresholding operator. This 
operator keeps the 5' elements of x with largest absolute value 
untouched and sets to zero the remaining elements. Although all 
the results presented in this paper use the ^o-norm as regulariza¬ 
tion, sometimes it is interesting to use the ^i-norn|^ In this case, 
the proximal operator is the smooth thresholding operator, which 
is given by: 

prox^^ (x) = sign(x)(|x| - T)+, (12) 

where (•)+ denotes the positive part. 

^ The ^i-norm is given by: ||x||i = 2^- \xi\. 
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Putting together Eqs. ([^ and ( pT] ), the simplest proximal al¬ 
gorithm that uses the gradient descent method to solve Eq. (|7]) is 
given by (e.g., |Parikh & Boyd|2014| ): 

qt+i = Hs [qk - . (13) 

This iteration is just a plain gradient descent algorithm (which 
only makes use of first order derivatives) which is augmented 
by using a hard thresholding projection operator in each itera¬ 
tion. In other words, after moving the solution on the direction 
of the negative gradient (controlled by the step-size h), one sets 
to zero all elements that do not fulfill the sparsity constraint. This 
simple iterative scheme displays a sublinear convergence rate 
0(1 Ik). Recently, a trivial improvement of the gradient descent 
method known as Past Iterative Shrinkage-Thresholding Algo¬ 
rithm (PISTA) has been developed by [Beck & Teboulle] ([2009]) 
showing a quadratic convergence rate 0(11k^). Algorithm a 
combination of PISTA and the restarting scheme developed by 
|Q’Donoghue & Candesj p012| ), is our method of choice in this 
paper. We leave for the future the analysis of algorithms that 
smartly introduce some second-order information without ever 
constructing the Hessian matrix (e.g., [Becker & Padili|2012|). 

Eq. ( p^ shows that the optimization method relies on the 
computation of the gradient of the merit function with respect to 
the modes q, that we have made explicity by using the subindex 
q on Given the linear character of the transformation of 

Eq. Q, this gradient can be trivially related to the gradient with 
respect to the physical parameters (the response functions), that 
can be analitically written as: 


= W-‘Vp,.V 




par- 


(14) 


The method described in this section is independent of the 
specific details of the transform W. The only condition is that it 
has to be linear and invertible. Apart from that, it is also desir¬ 
able from the computational point of view that the linear trans¬ 
form has a fast algorithm available. This is the case both for the 
orthogonal wavelet and DCT transforms. 


2.6. Possible improvements 

Even though gradient descent methods are known to be fast 
when approaching the minimum, they become slower in the re¬ 
finement phase of the solution (that is precisely the reason why 
the Levenberg-Marquardt method is a combination of a gradient 
descent method and a Newton method, controlled by the Hes¬ 
sian). However, the non-convex optimization of large-scale prob¬ 
lems has to rely on first-order derivatives because the calculation 
and storage of the Hessian is impractical. In our experience and 
that of many others in the literature, the EISTA algorithm is a 
competitive optimization technique. To achieve a good conver¬ 
gence speed, the vector of step-sizes h has to be smartly chosen. 
Extensive experiments have convinced us that these steps can 
be kept fixed and used in different data sets without any special 
impact on the convergence speed. However, we are currently in¬ 
vestigating the possibility of improving the convergence speed 
along two lines. Eirst, using approximations to the diagonal Hes¬ 
sian matrix that can be efficiently computed using quasi-Newton 
methods. Quasi-Newton methods update the Hessian matrix by 
analyzing successive gradient vectors using a generalization of 
the secant method. We anticipate that the methods developed by 
[Becker & Eadili ( [2012[ ) or [Marjugi & Leong[ ( [2013[ ) can be of 
help. Second, using conjugate gradient methods. 

We have noticed that convergence is greatly improved if the 
inversion is started forcing a very sparse solution, which in prac¬ 
tice is similar to reducing the number of parameters in the first 
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steps ( [Ruiz Cobo & del Toro Iniestail992[ ). Once the model can¬ 
not be further improved, the inversion can be re-started assum¬ 
ing the final target sparsity, and therefore, giving more freedom 
to the inversion in the final steps. In fact, we have noticed that 
in cases where the sparsity is assumed to be low, the inversion 
can get stuck when;^^ is still very high. We blame the EISTA al¬ 
gorithm with fixed h for this behaviour. We expect to ameliorate 
this behaviour with the future improvements that are described 
above. 


2.7. Summary 

The inversion scheme that we propose in this paper works as 
follows. A linear transform is chosen so that the transformed 2D 
maps of model parameters are as sparse as possible. We have 
tested that DCT and wavelet transforms seem to be appropriate. 
At the beginning of the iterative process, all the model param¬ 
eters are initialized using any appropriate initialization like any 
standard inversion code. The initial solution is transformed to 
obtain the first estimation of the modes. At each iteration, the 
value of the merit function and of its gradient with respect to 
the model parameters taking into account all observed pixels are 
computed. To this end, the inverse transform of the current solu¬ 
tion has to be computed. The gradients, reordered as 2D maps, 
are transformed using Eq. Once the gradient descent step 
is applied, the sparsity constrain is applied by hard thresholding 
the resulting new modes. In this somehow trivial step is where 
the sparsity condition is applied. Einally, algorithm is repeated 
until convergence. 


3. The computer code 

We have developed a computer code using the ideas presented 
in the previous section. The code is written in C-I--I- using the 
Message Passing Interface (MPI) standard for parallelization us¬ 
ing a master-slave strategy. One node works as a master, sending 
calculations to the slaves and receiving the results. Although the 
architecture of the code is very general, the current version of the 
code only works with the ME model atmosphere. It can be used 
to carry out pixel-by-pixel inversions (using the EM algorithm) 
and also sparsity regularized two-dimensional inversions (using 
the EISTA algorithm). We will provide extensive details on the 
implementation in a near future, once the code is expanded to 
use more realistic cases than the ME included in this paper (de 
la Cruz Rodriguez et al. in prep.). Additionally, we will make the 
code freely available. 

The code starts by using an initial estimation of the trans¬ 
formed coefficients for the maps of parameters. With this initial¬ 
ization, the computationally heavy part of the calculation of the 

and the gradient term V^x^ is fully parallelized, with each 
slave working on a piece of the map. When all the information is 
gathered by the master node, it computes the gradient term in the 
transformed domain by direct application of Eq. ([g. This are 
the needed quantities to evaluate one iteration of the EISTA al¬ 
gorithm and produce a correction of the transformed coefficients 
of the maps of physical quantities. This very same procedure is 
iterated until convergence. 

The code also allows the user to specify a PSE that is used for 
degrading the synthetic profiles and carry out the spatial decon¬ 
volution and the regularized inversion simultaneously, similar to 
the approach followed by [van Noort| \2bl2) . The inclusion of 
the deconvolution is straightforward in our first-order approach. 
The convolution operator is linear so that, apart from convolving 
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Fig. 4. Inverted maps for some of the most relevant physical parameters of the model for a quiet Sun observation. The map is 512x512 pixels. The 
left column shows the pixel-by-pixel results, while the central panels display the results of the sparsity regularized inversion, using a Daubechies-8 
orthogonal wavelet and a thresholding value of 50%. The third column corresponds to a sparsity regularized inversion compensating for the Hinode 
telescope PSF. 
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Pixel-by-pixel Sparse 



Fig. 5. Value of the for each Stokes parameter for the pixel-by-pixel 
inversion (left panels) and the sparsity regularized inversion (right pan¬ 
els). 


the Stokes profiles with the PSF, we only need to convolve the 
response functions with the PSF, so that Eq. ( p^ becomes 

= P * fw-i Vp,;^r2j ^ / = 1,..., (15) 

where * is the convolution operator. 

4. Illustrative examples 

As a demonstration of our approach, we present in Fig. the 
inverted maps for some physical parameters of the ME model 
in the quiet Sun map. Since the inversion is carried out with a 
very simple model with only one component that is not able to 
capture the behavior of unresolved fields, we cannot infer much 
from the magnetic properties on the map. However, a few things 
need to be commented: 

1. The pixel-to-pixel inversion has been carried out in a multi- 
step process to improve the quality of the fit. First, we ob¬ 
tained a result from 50 randomizations of the initial parame¬ 
ters, keeping only the best solution for each pixel. Then we 
smooth the map with a Gaussian kernel. Finally, we re-run 
the inversion with another 50 randomizations, starting from 
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Fig. 7. Convergence of x^ as a function of iteration number for the two 
data sets. 


the smoothed maps. We keep the solution that produces the 
best fit. 

2. The general appearance of the maps is much cleaner for the 
sparsity regularized maps, a consequence of the threshold¬ 
ing. This avoids arbitrary pixel-to-pixel variations because 
some spatial coherence is forced. 

3. The map of Doppler velocities are indistinguishable for the 
two inversions, meaning that this quantity is very easy to find 
in a pixel-by-pixel inversion. 

4. The spatially regularized inversion returns lots of fields in¬ 
clined by -90°. This is a direct consequence of the fact that 
many pixels have very low signal-to-noise ratio (SNR). The 
bias produced by the noise on the transversal component of 
the magnetic field ( [Martinez Gonzalez et al.|[2012| ) makes 
that the fields systematically appear very inclined. 

5. We also find that the Doppler width of the line is larger in 
the granules than in the intergranules on the sparsity regu¬ 
larized solution. The opposite is found in the pixel-by-pixel 
inversion. This behavior is compensated in the sparsity reg¬ 
ularized solution by an small increase of the magnetic field. 

6. The maps in the third column of Fig. show the effect of 
compensating for the telescope PSF. Features become more 


conspicuous and compact, as demonstrated by [van Noort 
( |2012j ) and |Ruiz Cobo & Asensio Ramos| ( |2013] ). 


The quality of the fits is assessed by computing the asso¬ 
ciated x^ for each Stokes parameter and shown in Fig. Curi¬ 
ously, the sparsity regularized inversion is comparable for Stokes 
2, U and V and much better for Stokes /, because the sparse 
solution does not show all the structuring visible in the pixel- 
by-pixel one. This demonstrates that our inversion code finds a 
much simpler model (in terms of the number of free parameters) 
that explains the data better. 

Figurej^shows the results for the sunspot data. Again, the in¬ 
ferred Doppler velocities are very similar in the two approaches, 
giving robustness to this quantity. The same behavior is also seen 
in the Doppler width of the line, with granules presenting spec¬ 
tral lines with an enhanced broadening as compared with the in¬ 
tergranular lanes. The same applies to the penumbra, where we 
find dark filaments to have a smaller Doppler width. Concerning 
the magnetic field vector, we find smoother smoother and much 
less noisy results. 

Concerning the convergence speed. Fig. [^displays the value 
of the reduced x^ in the whole map and taking into account all 
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Fig. 6. Same as Fig.j^but for the sunspot data. 
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Stokes parameters for the two datasets. As usual in first-order, 
the initial convergence is very fast, while the refinement of the 
solution is slower. We note that artifacts can appear in the sparse 
inversion when the solution is not fully converged. Unlike the 
pixel-by-pixel case, these artifacts appear as spatially coherent 
patches in the solution, that deviate from the surroundings. When 
the solution is converged, these patches disappear. Therefore, 
a different intuition is needed to assess whether the solution is 
physically meaningful or not. 


Finally, Fig. [^illustrates the smearing effect of the PSF in 
the observed data. We have not noticed the oscillatory behavior 
that was reported by |van Noort ( 2012| ). We speculate that the 
regularization present in our sparse inversion naturally damps 
that behavior, but to be fair a ME model is simp ler than the 
depth-dependent case consider by [van Noort| ( |2012| ) and this is¬ 
sue should be addressed under the same conditions. 


5. Conclusion and outlook for the future 

This paper presents the first full 2D regularized inversion code 
for Stokes profiles based on the idea of sparsity. Although the 
atmosphere model that we use is very simple, it represents the 
first step toward a full suite of inversion codes. The strong spatial 
regularization that we introduce thanks to the sparsity constrain 
allows us to obtain very reliable results. 

Our method is built upon some of the ideas presented by 
[van Noort] ( |2012| ) and |Ruiz Cobo & Asensio Ramos| ( |2013] ): the 
model is spatially coupled and regularized. Furthermore, given 
that we do not use a Hessian based inversion method, compen¬ 
sating the data for telescope or stray-light PSF is trivial, as it 
does not add extra complexity to the code, beyond the actual 
computation of convolutions. However, if the telescope PSF is 
not known, the sparse regularization will also couple the param¬ 
eters from neighboring pixels. The intrinsic nature of the wavelet 
transform allows to reproduce small-scale and localized features, 
as well as the large scales and therefore, reducing the number 
of free parameters does not necessarily preclude the code from 
reproducing gradients and sharp features if the correct sparsity 
level is chosen. 

In this study we have used the first-order accelerated FISTA 
optimization algorithm to perform the 2D sparse inversion. We 
emphasize that we are exploring alternative methods to further 
improve the convergence of the inversion. 

A fast and parallel inversion code implementing everything 
that is presented in this work and some extensions as described 
in the following will be freely available for the community soon. 
The code will be extensible if the user provides the emergent 
Stokes profiles from a given parameterized atmosphere and the 
response function of the Stokes parameters to all parameters. 
This response function can be computed analytically or, in gen¬ 
eral, numerically. 

The concept of sparse regularization can be extended to 
many other problems that we plan to analyze in this series of 
papers, not necessarily maintaining the order. In a first step, we 
plan to extend the sparse regularization to atmospheres with gra¬ 
dients in the vertical direction. This will be done by fixing a set 
of heights in a common optical depth scale (for instance, the op¬ 
tical depth scale at 500 nm) that will play the role of nodes, in 
parallel to which is currently done in standard inversion codes 
like SIR, Nicole or SPINOR. The physical parameters (tempera¬ 
ture, velocity, magnetic field, etc.) at each node will be assumed 
to be sparse in a wavelet basis and the strategy used in this paper 
will be applied to fit the emergent Stokes profiles. 
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In a second step, the concept of nodes will be surpassed. 
Each physical parameter will be determined by a 3D cube that 
will be assumed to be sparse in a three-dimensional basis (DCT 
and orthogonal wavelets are trivially extended to the three di¬ 
mensional case). Therefore, the inversion will proceed by infer¬ 
ring a reduced number of wavelet modes for each physical pa¬ 
rameter. The regularization imposed in this 3D inversion is very 
strong because the observation of a single pixel provides (proba¬ 
bly little) information for constraining the depth stratification of 
all the pixels in the cube. 

In a third step, standard pixel-by-pixel inversions can be car¬ 
ried out using a sparse regularization in height. The concept of 
nodes, that have been present in all inversion codes with depth 
stratification looses its meaning again. The depth stratification 
of the physical parameters of interest will be expanded in a suit¬ 
able dictionary of depth-dependent functions (we use the term 
dictionary because the functions do not need, in principle, to be 
orthogonal) and the optimization will select the smallest number 
of functions needed to reproduce the Stokes profiles. 

Finally, we note that our strategy could potentially be ex¬ 
tended to do regularized superresolution. We leave this analysis 
for the future because it needs to be studied in detail. 
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Fig. 8. Illustration of the effect of the telescope PSF in the data. The FOV is divided in two halfs for a monochromatic image at AT = 91 mA from 
line center in Stokes I (left) and V (right). The lower half shows the observations, without applying the PSF compensation. The upper half shows 
the synthetic image from the inversion, including the PSF compensation. This inversion has been performed assuming 50% sparsity. 
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